Evolving Regular Expression-Based Sequence Classifiers for Protein Nuclear Localisation

Authors

  • Amine Heddad
  • Markus Brameier
  • Robert M. MacCallum
Abstract

A number of bioinformatics tools use regular expression (RE) matching to locate protein or DNA sequence motifs that have been discovered by researchers in the laboratory. For example, patterns representing nuclear localisation signals (NLSs) are used to predict nuclear localisation. NLSs are not yet well understood, and so the set of currently known NLSs may be incomplete. Here we use genetic programming (GP) to generate RE-based classifiers for nuclear localisation. While the approach is a supervised one (with respect to protein location), it is unsupervised with respect to already known NLSs. It therefore has the potential to discover new NLS motifs. We apply both tree-based and linear GP to the problem. The inclusion of predicted secondary structure in the input does not improve performance. Benchmarking shows that our majority classifiers are competitive with existing tools. The evolved REs are usually “NLS-like” and work is underway to analyse these for novelty.
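As a rough illustration of what an RE-based majority classifier might look like, the following minimal sketch lets several regular expressions vote on a protein sequence. The patterns are hypothetical, hand-written "NLS-like" expressions chosen for illustration; they are not the evolved REs from this work, and the names `PATTERNS`, `votes` and `predict_nuclear` are assumptions introduced here.

```python
import re

# Hypothetical, hand-written NLS-like patterns for illustration only;
# these are NOT the evolved expressions described in the abstract.
PATTERNS = [
    re.compile(r"K[KR].[KR]"),             # short basic cluster (monopartite-like)
    re.compile(r"[KR]{4,}"),               # run of four or more basic residues
    re.compile(r"[KR]{2}.{10,12}[KR]{3}"), # two basic clusters with a spacer (bipartite-like)
]


def votes(sequence):
    """Return one boolean per pattern: True if the pattern matches anywhere."""
    return [bool(p.search(sequence)) for p in PATTERNS]


def predict_nuclear(sequence):
    """Predict nuclear localisation if a strict majority of patterns match."""
    v = votes(sequence)
    return sum(v) * 2 > len(v)


if __name__ == "__main__":
    # Toy sequences: the first embeds the well-known SV40 large T antigen NLS (PKKKRKV).
    for seq in ("MAPKKKRKVED", "MASGLLEDVAGST"):
        print(seq, votes(seq), predict_nuclear(seq))
```

In a GP setting, each pattern would presumably be produced by evolution against labelled training proteins rather than written by hand; the voting step shown here is only one simple way to combine several such classifiers.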


Similar resources

The Effect of 8-Weeks of Low-Intensity Swimming Training on Promyelocytic Leukemia Zinc Finger Protein and Spermatid Transition Nuclear Protein Gene Expression in Azoospermic Rats Model

Aims: One of the causes of infertility in men is azoospermia, which is defined by the absence of sperm in the semen. The primary function of spermatogenesis is the maintenance, proliferation, and differentiation of spermatogonial cells. Thus, the present study aimed to investigate the changes in Promyelocytic Leukemia Zinc Finger (PLZF) and spermatid Transition Nuclear Protein (TNP...


Evolving Classifiers for Protein Nuclear Localisation using Genetic Programming

Being able to predict the location of a protein in the cell is one of the steps toward knowing its role and activity. With that information one could also conclude possible effects on the organism carrying those proteins. The number of putative, unclassified proteins is constantly growing due to the continuous genome sequencing projects. Hence, the need for fast and cheap methods to classify pr...


NucPred - Predicting nuclear localization of proteins

NucPred analyzes patterns in eukaryotic protein sequences and predicts if a protein spends at least some time in the nucleus or no time at all. Subcellular location of proteins represents functional information, which is important for understanding protein interactions, for the diagnosis of human diseases and for drug discovery. NucPred is a novel web tool based on regular expression...


Cloning, Expression, Purification and Immunoreactivity Analysis of Gag Derived Protein p17 from HIV-1 CRF35 in Fusion with Thioredoxin from Human Subjects

So far, recombinant antigens of HIV-1, the etiologic cause of Acquired Immunodeficiency Syndrome (AIDS), have been widely used for diagnosis and vaccine development. P17, the matrix protein formed by the proteolytic cleavage of gag, is strongly antigenic and is as conserved and immunogenic as p24. In some cases, antibodies to p17 are more prevalent than antibodies to p24 and the decline in...


Evolving Protein Motifs Using a Stochastic Regular Language with Codon-Level Probabilities

Experiments involving the evolution of protein motifs using genetic programming are presented. The motifs use a stochastic regular expression language that uses codon-level probabilities within conserved sets (masks). Experiments compared basic genetic programming with Lamarckian evolution, as well as the use of “natural” probability distributions for masks obtained from the sequence database. ...



Publication date: 2004